
CVPR 2020 Image Matching Challenge: a new dataset + a new evaluation protocol, and the SOTA is trembling!


2023-03-11 19:13 | Source: compiled from the web


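Recovering the 3D structure of objects from a collection of images is a popular topic in computer vision research; it is what lets us enjoy the scenery of Easter Island on Google Maps from thousands of miles away. Such results benefit from images captured under controlled conditions (equipment, environment, etc.), which keeps the consistency and quality of the final reconstruction high, but the same control also limits how much the capture devices and viewpoints can vary. Imagine how great it would be if, instead of relying on professional equipment, we could use SfM to reconstruct this complex world from the huge number of photos on the internet!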

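To accelerate research in this area and make better use of the information contained in image data, Google, together with UVIC, CTU and EPFL, published the paper "Image Matching across Wide Baselines: From Paper to Practice". It releases a new benchmark module and dataset for measuring the quality of feature matching, where "matching" means 2D matching between images. The benchmark makes it easy to plug in and evaluate popular existing feature-matching algorithms, whether traditional or learning-based.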


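Image feature matching is one of the fundamental, core problems in computer vision: image retrieval [48] [7] [69] [91] [63], 3D reconstruction [3] [43] [79] [106], re-localization [74] [75] [51], SLAM [61] [30] [31] and many other research areas rely on it. The problem has been studied for decades but is still not well solved. Feature matching faces many challenges, chiefly changes in viewpoint, scale, rotation and illumination, occlusion, and camera rendering.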

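In recent years, researchers have turned their attention to end-to-end learning methods (image -> pose), but these have not even reached the performance of the traditional pipeline (image -> matches -> bundle adjustment). The traditional pipeline usually splits 3D reconstruction into two subproblems: feature matching and pose estimation. New methods for each subproblem are evaluated with intermediate metrics, but performance on one subproblem in isolation says little about the performance of the whole pipeline. For example, some works show an advantage over handcrafted SIFT only on one particular dataset (in repeatability or matching score); does that advantage still hold in real applications? The experiments below show that a properly tuned traditional algorithm can rival methods currently billed as "state of the art" (quite a slap in the face).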


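The paper's contributions:
1. A public dataset of 30k images with depth maps and ground-truth poses (posed images).
2. A modular pipeline that combines dozens of classic and recent methods for feature extraction, matching and pose estimation, plus various heuristics, each of which can be swapped out and tuned independently.
3. Two downstream tasks: stereo and multi-view reconstruction.
4. A comprehensive study of dozens of handcrafted and learned features and techniques, their combinations, and the process of selecting their hyperparameters.

Related work

Local features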

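Since the introduction of SIFT, local features have been the mainstream approach. The pipeline consists of a few main steps: keypoint detection, orientation estimation, and descriptor extraction. Besides SIFT, handcrafted features include SURF [15], ORB [73] and AKAZE [4].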
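The middle step can be illustrated with a simplified, SIFT-style orientation assignment in pure Python: gradient orientations inside a patch are accumulated into a magnitude-weighted histogram, and the peak bin is taken as the keypoint's orientation. This is a sketch, not Lowe's exact procedure: the 36-bin histogram follows the usual description of SIFT, but Gaussian weighting, peak interpolation and extra orientations for secondary peaks are omitted, and `dominant_orientation` is a hypothetical name.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Hypothetical helper: simplified SIFT-style orientation assignment.

    patch: 2D list of intensities (rows = y, columns = x).
    Returns the start angle, in degrees, of the histogram bin with the
    largest accumulated gradient magnitude.
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # central-difference image gradient
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle // bin_width) % num_bins] += mag  # magnitude-weighted vote
    peak = max(range(num_bins), key=hist.__getitem__)
    return peak * bin_width

# Intensity increasing left to right: all gradients point along +x.
ramp = [[x for x in range(9)] for _ in range(9)]
print(dominant_orientation(ramp))  # 0.0
```

The descriptor is then computed in a frame rotated by this angle, which is what makes it invariant to in-plane rotation.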

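Modern descriptors typically train deep networks on image patches pre-cropped around SIFT keypoints (i.e., DoG), including DeepDesc [82], TFeat [11], L2-Net [89], HardNet [57], SOSNet [90] and LogPolarDesc [34] (most of them are trained on the same dataset).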

Some recent works train with additional cues such as geometry or global context, including GeoDesc [50] and ContextDesc [49].

Other methods learn the keypoint detector separately from the descriptor, e.g. TILDE [95], TCDet [103], QuadNet [78] and Key.Net [13]. Still others train the two jointly, e.g. LIFT [99], DELF [63], SuperPoint [31], LF-Net [64], D2-Net [33] and R2D2 [72].


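In wide-baseline stereo, the inlier ratio of putative matches can be as low as 10%, or even lower, so pose estimation needs algorithms that can still recover the pose from such matches. The common approach combines the 5-point [62], 7-point [41] or 8-point [39] algorithm with RANSAC (random sample consensus). Improvements include local optimization [24], MLESAC [92], PROSAC [23], DEGENSAC [26], GC-RANSAC [12], MAGSAC [29], CNe (Context Networks) [100] + RANSAC, as well as [70] [104] [85] [102]. The authors close with "Despite their promise, it remains unclear how well they perform in real settings" (skeptical, ha).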
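Why low inlier ratios hurt so much is visible in the standard RANSAC stopping rule: to draw at least one all-inlier minimal sample of size s with confidence τ when the inlier ratio is w, about N = log(1 - τ) / log(1 - w^s) samples are needed. The pure-Python sketch below applies that rule adaptively inside a toy RANSAC that fits a 2D line (minimal sample of 2 points) instead of a fundamental matrix. The line model, the data and the function names are illustrative assumptions; none of the improved variants listed above are implemented here.

```python
import math
import random

def ransac_iterations(confidence, inlier_ratio, sample_size):
    """Samples needed so that, with probability `confidence`, at least one
    minimal sample of `sample_size` points contains only inliers."""
    if not (0.0 < confidence < 1.0) or not (0.0 < inlier_ratio <= 1.0):
        raise ValueError("need 0 < confidence < 1 and 0 < inlier_ratio <= 1")
    p_clean = inlier_ratio ** sample_size
    if p_clean >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_clean))

def line_through(p, q):
    """Hypothetical helper: line a*x + b*y + c = 0 through p and q, (a, b) unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None  # degenerate sample: identical points
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, threshold, confidence=0.99, max_iters=10_000, seed=0):
    """Toy RANSAC: hypothesise a line from 2 random points, count inliers within
    `threshold` (the analogue of the inlier threshold eta), shrink the budget adaptively."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    budget, it = max_iters, 0
    while it < budget:
        it += 1
        model = line_through(*rng.sample(points, 2))
        if model is None:
            continue
        a, b, c = model
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
            w = len(inliers) / len(points)  # current inlier-ratio estimate
            budget = min(max_iters, ransac_iterations(confidence, w, 2))
    return best_model, best_inliers

print(ransac_iterations(0.99, 0.5, 2))  # 17
```

With 80% inliers on the toy line, the budget drops to 5 samples as soon as a clean hypothesis is found. By contrast, `ransac_iterations(0.99, 0.1, 7)` for the 7-point algorithm at a 10% inlier ratio is over 4.6 × 10^7, which is why smarter sampling, local optimization and learned pre-filtering matter for wide baselines.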


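Structure-from-Motion (SfM) methods include [3] [43] [27] [37] [106]; the most popular are VisualSFM [98] and COLMAP [79] (the latter is used to produce the ground truth).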



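Datasets
- Oxford dataset [54]: 48 images with ground-truth homographies.
- HPatches [9]: 696 images with illumination and viewpoint changes; planar scenes without occlusion.
- Others: DTU [1], Edge Foci [107], Webcam [95], AMOS [67] and Strecha [83].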
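For Oxford and HPatches the ground truth is a 3×3 homography H, which transfers points exactly between two views only when the scene is planar (or the camera purely rotates); this is one reason such benchmarks cannot represent non-planar, occluded scenes. A minimal pure-Python sketch of that transfer in homogeneous coordinates, with `apply_homography` as a hypothetical helper name:

```python
def apply_homography(H, x, y):
    """Hypothetical helper: map pixel (x, y) through a 3x3 homography H (list of rows)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return u / w, v / w  # back from homogeneous to pixel coordinates

# A pure translation by (+5, -2) pixels.
H = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
print(apply_homography(H, 10, 10))  # (15.0, 8.0)
```

Because H is defined only up to scale, multiplying all its entries by a constant gives the same mapping.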

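All of the datasets above have limitations: narrow baselines, noisy ground truth, or few images. Learned descriptors are usually trained on [21], and their advantage over SIFT may simply come from overfitting (will those authors blush when they read this?). In addition, datasets for navigation, relocalization and SLAM include KITTI [38], Aachen [76], RobotCar [52] and CMU Seasons [75] [8], but none of them contains the variety of transformations found in Phototourism data.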

Phototourism dataset

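Since the datasets above are so "bad", the authors built what they consider an ideal public dataset: the Phototourism dataset. They selected 25 popular landmark collections from [43] [88] (30k images in total), each with hundreds to thousands of images. The images were downsampled so that their largest dimension is at most 1024 pixels, and COLMAP [79] was used to recover poses, point clouds and depth; the reconstructed models are then used to filter out occluders.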
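Capping the longest image side at 1024 pixels while keeping the aspect ratio is simple arithmetic, sketched below in pure Python. Two details are assumptions rather than facts from the source: images already within the cap are left unchanged (no upscaling), and pinhole intrinsics are rescaled by the same per-axis factors, which is needed for ground-truth poses to stay consistent with the resized pixels. Both function names are hypothetical.

```python
def resize_to_max_dim(width, height, max_dim=1024):
    """Hypothetical helper: new (width, height) with the longer side capped at max_dim."""
    scale = min(1.0, max_dim / max(width, height))  # never upscale (assumption)
    return max(1, round(width * scale)), max(1, round(height * scale))

def scale_intrinsics(fx, fy, cx, cy, old_size, new_size):
    """Hypothetical helper: rescale pinhole intrinsics to a resized image.
    Ignores the half-pixel offset that some pixel-centre conventions add."""
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    return fx * sx, fy * sy, cx * sx, cy * sy

print(resize_to_max_dim(4000, 3000))  # (1024, 768)
```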






Fully handcrafted features: SIFT [48] (and RootSIFT [6]), SURF [15], ORB [73], AKAZE [4], and the FREAK [108] descriptor on BRISK [109] keypoints, all using the OpenCV implementations; except for ORB, the detection threshold is lowered to extract more features. Several detectors from VLFeat [94] are also considered: (VL-)DoG, Hessian [16], Hessian-Laplace [55], Harris-Laplace [55] and MSER [53], along with their affine variants DoG-Affine, Hessian-Affine [55] [14], DoG-AffNet [59] and Hessian-AffNet [59].
Descriptors learned on DoG keypoints: L2-Net [89], HardNet [57], GeoDesc [50], SOSNet [90], ContextDesc [49] and LogPolarDesc [34].
End-to-end learned features: SuperPoint [31], LF-Net [64] and D2-Net [33], with single-scale (SS) and multi-scale (MS) variants.

Feature matching
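A common recipe for this stage, sketched here in pure Python under the assumption that descriptors are plain lists of floats, converts SIFT descriptors to RootSIFT [6] and then keeps nearest-neighbour matches that pass Lowe's ratio test and a mutual (cycle-consistency) check. The 0.8 ratio and the function names are illustrative assumptions, not the benchmark's tuned settings.

```python
import math

def root_sift(desc, eps=1e-7):
    """RootSIFT [6]: L1-normalise a (non-negative) SIFT descriptor, then take the
    element-wise square root, so Euclidean distance on the result corresponds to
    the Hellinger kernel on the original."""
    l1 = sum(abs(v) for v in desc) + eps
    return [math.sqrt(abs(v) / l1) for v in desc]

def _two_nearest(query, pool):
    """Return (index of nearest, distance to nearest, distance to second nearest)."""
    ranked = sorted((math.dist(query, d), j) for j, d in enumerate(pool))
    second = ranked[1][0] if len(ranked) > 1 else math.inf
    return ranked[0][1], ranked[0][0], second

def match_descriptors(desc_a, desc_b, ratio=0.8, mutual=True):
    """Hypothetical helper: nearest-neighbour matching with Lowe's ratio test and an
    optional mutual-nearest-neighbour (cycle-consistency) check."""
    if not desc_a or not desc_b:
        return []
    matches = []
    for i, d in enumerate(desc_a):
        j, d1, d2 = _two_nearest(d, desc_b)
        if d1 >= ratio * d2:   # ambiguous: best is not clearly better than second best
            continue
        if mutual and _two_nearest(desc_b[j], desc_a)[0] != i:
            continue           # B's best match points back to a different descriptor
        matches.append((i, j))
    return matches

print(match_descriptors([[1, 0, 0], [0, 1, 0]], [[0.9, 0.1, 0], [0, 0.95, 0.05], [0, 0, 1]]))  # [(0, 0), (1, 1)]
```

The ratio test discards matches whose best and second-best candidates are nearly equally close, and the mutual check discards one-sided matches; both trade recall for precision before RANSAC.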



Outlier pre-filtering: Context Networks [100] (CNe for short) are used to filter putative matches before RANSAC [100] [85].

Stereo task

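Given images $\mathbf{I}_i$ and $\mathbf{I}_j$, the task is to estimate the fundamental matrix $\mathbf{F}_{i,j}$. Besides the RANSAC [36] [25] implementations in OpenCV [19] and sklearn [65], the authors also use DEGENSAC [26], GC-RANSAC [12] and MAGSAC [29]. Finally, the pose is recovered with OpenCV's recoverPose() function.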
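OpenCV's classic recoverPose() overload decomposes an essential matrix rather than F, so the fundamental matrix has to be converted using the camera intrinsics first. Assuming known calibration matrices $\mathbf{K}_i$, $\mathbf{K}_j$ (a step left implicit above) and the convention that $\mathbf{F}_{i,j}$ maps points of image $i$ to epipolar lines in image $j$, substituting normalized coordinates $\hat{\mathbf{x}} = \mathbf{K}^{-1}\mathbf{x}$ into the epipolar constraint gives:

$$
\mathbf{x}_j^{\top}\mathbf{F}_{i,j}\,\mathbf{x}_i = 0
\;\Longrightarrow\;
\hat{\mathbf{x}}_j^{\top}\big(\mathbf{K}_j^{\top}\mathbf{F}_{i,j}\,\mathbf{K}_i\big)\hat{\mathbf{x}}_i = 0,
\qquad
\mathbf{E}_{i,j} = \mathbf{K}_j^{\top}\mathbf{F}_{i,j}\,\mathbf{K}_i
$$

$\mathbf{E}_{i,j}$ is then decomposed into a rotation and a translation direction; a cheirality check (triangulated points in front of both cameras) selects one of the four candidate solutions. Since the translation is only known up to scale, any pose error on it has to compare directions.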

Multi-view task

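Since the goal is to evaluate features rather than SfM algorithms, the authors randomly sample images from several large scenes to form small subsets called "bags": 100 bags each of 3 and of 5 images, 50 bags of 10 images, and 25 bags of 25 images, for 275 bags in total. The matches, after outlier filtering, are fed into COLMAP [79] for SfM reconstruction.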
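A sketch of how such bags could be drawn, in pure Python. Uniform sampling without replacement within one scene is an assumption made for illustration; the actual benchmark may add constraints, such as enough co-visibility for a bag to be reconstructable. `BAG_SPEC` and `sample_bags` are hypothetical names.

```python
import random

# Hypothetical spec mirroring the counts above: bag size -> number of bags.
BAG_SPEC = {3: 100, 5: 100, 10: 50, 25: 25}

def sample_bags(image_ids, spec=BAG_SPEC, seed=0):
    """Draw bags of distinct images uniformly at random (assumed sampling scheme)."""
    rng = random.Random(seed)
    bags = []
    for size, count in spec.items():
        for _ in range(count):
            bags.append(sorted(rng.sample(image_ids, size)))
    return bags

print(len(sample_bags(list(range(100)))))  # 275
```

Fixing the seed keeps the bags identical across runs, so every feature method is evaluated on exactly the same subsets.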

Error metrics:
- mAA (mean Average Accuracy): stereo and multi-view tasks
- ATE (Absolute Trajectory Error): multi-view task

Experimental setup


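Overall, MAGSAC performs best, with DEGENSAC second. The authors also point out that "default settings can be woefully inadequate. For example, OpenCV sets τ = 0.99 and η = 3 pixels, which results in a mAP at 10° of 0.5292 on the validation set – a performance drop of 23.9% relative." Here τ is the RANSAC confidence and η the inlier threshold, so when using OpenCV's RANSAC functions in practice, tune these hyperparameters yourself.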
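The metric behind these numbers can be sketched in pure Python (the quote says mAP, which appears to be an earlier name for mAA). The sketch assumes, as a simplification to check against the paper, that the stereo pose error of an image pair is the larger of the rotation and translation angular errors (translation compared only by direction, since it is recovered up to scale), and that mAA averages the accuracy at thresholds of 1° to 10° in 1° steps. All function names are hypothetical.

```python
import math

def rotation_angle_deg(R_est, R_gt):
    """Angle of the relative rotation R_est^T R_gt; its trace equals the
    element-wise (Frobenius) inner product of the two matrices."""
    tr = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    return math.degrees(math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0))))

def translation_angle_deg(t_est, t_gt):
    """Angle between translation directions (scale is unobservable)."""
    dot = sum(a * b for a, b in zip(t_est, t_gt))
    norm = math.sqrt(sum(a * a for a in t_est)) * math.sqrt(sum(b * b for b in t_gt))
    if norm == 0:
        raise ValueError("zero-length translation")
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def pose_error_deg(R_est, t_est, R_gt, t_gt):
    return max(rotation_angle_deg(R_est, R_gt), translation_angle_deg(t_est, t_gt))

def mean_average_accuracy(errors_deg, max_threshold=10):
    """Mean over thresholds 1..max_threshold degrees of the fraction of pairs below each."""
    if not errors_deg:
        raise ValueError("no pose errors given")
    thresholds = range(1, max_threshold + 1)
    accs = [sum(e < t for e in errors_deg) / len(errors_deg) for t in thresholds]
    return sum(accs) / len(accs)

print(mean_average_accuracy([0.5, 3.0, 20.0, 7.5]))  # 0.5
```

With per-pair errors of 0.5°, 3°, 7.5° and 20°, the accuracy is 0.25 up to 3°, 0.5 from 4° to 7° and 0.75 from 8° to 10°, averaging to 0.5. A badly chosen RANSAC threshold shows up as errors pushed past these cut-offs.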







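On mAA, DoG keypoints take the top spots: SOSNet ranks #1, closely followed by HardNet. "HardNetAmos+" [56], trained on more data (Brown [20], HPatches [9] and AMOS [67]), does worse than the model trained only on the "Liberty" subset of Brown. On the multi-view task, DoG+HardNet is among the best, slightly ahead of ContextDesc, SOSNet and LogPolarDesc. R2D2 is the best end-to-end method and also does well on the multi-view task (#6), but falls behind SIFT on the stereo task. D2-Net does not perform well, possibly because downsampling the image leads to poor keypoint localization. Properly tuned SIFT, and RootSIFT in particular, ranks #9 on the stereo task and #9 on the multi-view task, trailing the so-called SOTA by 13.1% and 4.9% respectively. (Way to go, traditional features!)

2k features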



Key.Net+HardNet performs best, followed by LogPolarDesc; R2D2 ranks #2 on the stereo task and #7 on the multi-view task.

8k vs. 2k


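DoG-based methods readily benefit from more features, whereas learned methods benefit from retraining (a conclusion drawn from the Key.Net+HardNet combination, which the authors retrained and which performed very well). Overall, the learned features Key.Net, SuperPoint, R2D2 and LF-Net perform better in the multi-view setting than in the stereo setting; the authors hypothesize that they are robust but poorly localized.

Illumination changes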


New metrics vs. traditional metrics


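Matching score is a reasonably sensible proxy: it seems correlated with mAA, but a high matching score does not guarantee a better mAA (e.g. RootSIFT vs. ContextDesc). Repeatability is harder to relate to final pose accuracy. AKAZE has the best repeatability yet a very poor matching score and pose mAA; in the authors' own words (arXiv v1), the "descriptor may hurt its performance". Key.Net obtains the best repeatability but trails the DoG-based methods on mAA, even with the same HardNet descriptor.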
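To see why the two proxy metrics can disagree, here is a pure-Python sketch of both. It assumes the keypoints of image A have already been projected into image B with the ground truth (for Phototourism this would use depth and pose; here it is just an input). Repeatability only asks whether some keypoint in B lies within ε pixels; matching score additionally requires the descriptor match to land there. The 3-pixel ε and normalizing by the number of keypoints in A are assumptions, since papers differ on both, and the function names are hypothetical.

```python
import math

def repeatability(kpts_a_in_b, kpts_b, eps=3.0):
    """Hypothetical helper: fraction of A's projected keypoints with a B keypoint within eps px."""
    if not kpts_a_in_b:
        return 0.0
    repeated = sum(1 for p in kpts_a_in_b if any(math.dist(p, q) <= eps for q in kpts_b))
    return repeated / len(kpts_a_in_b)

def matching_score(kpts_a_in_b, kpts_b, matches, eps=3.0):
    """Hypothetical helper: correct descriptor matches (i, j) divided by A's keypoint count."""
    if not kpts_a_in_b:
        return 0.0
    correct = sum(1 for i, j in matches if math.dist(kpts_a_in_b[i], kpts_b[j]) <= eps)
    return correct / len(kpts_a_in_b)

a_in_b = [(0, 0), (10, 10), (50, 50)]
b = [(1, 1), (10, 12), (100, 100)]
print(round(repeatability(a_in_b, b), 3))                                # 0.667
print(round(matching_score(a_in_b, b, [(0, 0), (1, 2), (2, 1)]), 3))   # 0.333
```

Two of the three keypoints are repeated, but only one descriptor match is correct: a detector can look excellent on repeatability while its descriptor undermines matching, as with AKAZE above.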


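Since I am currently using SuperPoint features, I pay particular attention to its results. Among the 2k-feature entries it does not do well: "SuperPoint (2k features, NMS=4), DEGENSAC" ranks only #35 out of the current 52 entries. However, SuperPoint + SuperGlue + DEGENSAC and SuperPoint + GIFT + Graph Motion Coherence Network + DEGENSAC take #1 and #2 respectively, which is very encouraging!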
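The "2k features, NMS=4" configuration combines non-maximum suppression with a cap on the number of keypoints. A simplified greedy version is sketched below in pure Python: keypoints are visited in order of decreasing score and kept only if no already-kept keypoint lies within the radius, stopping at k. This is not SuperPoint's own implementation, which differs in detail; the Euclidean radius, k = 2048 for "2k", and the name `nms_topk` are assumptions.

```python
import math

def nms_topk(keypoints, radius=4, k=2048):
    """Hypothetical helper: greedy radius NMS plus top-k cap.
    keypoints: list of (x, y, score); returns kept keypoints, best score first."""
    if k <= 0:
        return []
    kept = []
    for x, y, s in sorted(keypoints, key=lambda p: p[2], reverse=True):
        # suppress if a stronger keypoint was already kept within `radius` pixels
        if all(math.hypot(x - kx, y - ky) > radius for kx, ky, _ in kept):
            kept.append((x, y, s))
            if len(kept) == k:
                break
    return kept

kps = [(0, 0, 0.9), (2, 0, 0.8), (10, 10, 0.7), (10, 13, 0.95), (20, 20, 0.1)]
print(nms_topk(kps, radius=4, k=2))  # [(10, 13, 0.95), (0, 0, 0.9)]
```

A larger radius spreads keypoints more evenly over the image, which tends to help geometry estimation, at the cost of discarding strong but clustered detections.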


References

[1]: H. Aanaes, A. L. Dahl, and K. Steenstrup Pedersen. Interesting Interest Points. IJCV, 97:18–35, 2012.

[2]: H. Aanaes and F. Kahl. Estimation of Deformable Structure and Motion. In Vision and Modelling of Dynamic Scenes Workshop, 2002.

[3]: S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski. Building Rome in One Day. In ICCV, 2009.

[4]: P. F. Alcantarilla, J. Nuevo, and A. Bartoli. Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces. In BMVC, 2013.

[5]: Anonymous. DeepSFM: Structure From Motion Via Deep Bundle Adjustment. In Submission to ICLR, 2020.

[6]: Relja Arandjelovic. Three things everyone should know to improve object retrieval. In CVPR, 2012.

[7]: Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. In CVPR, 2016.

[8]: Hernan Badino, Daniel Huber, and Takeo Kanade. The CMU Visual Localization Data Set. http://3dvis., 2011.

[9]: V. Balntas, K. Lenc, A. Vedaldi, and K. Mikolajczyk. HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors. In CVPR, 2017.

[10]: Vassileios Balntas, Shuda Li, and Victor Prisacariu. RelocNet: Continuous Metric Learning Relocalisation using Neural Nets. In ECCV, 2018.

[11]: V. Balntas, E. Riba, D. Ponsa, and K. Mikolajczyk. Learning Local Feature Descriptors with Triplets and Shallow Convolutional Neural Networks. In BMVC, 2016.

[12]: Daniel Barath and Jiří Matas. Graph-Cut RANSAC. In CVPR, 2018.

[13]: Axel Barroso-Laguna, Edgar Riba, Daniel Ponsa, and Krystian Mikolajczyk. Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters. In ICCV, 2019.

[14]: A. Baumberg. Reliable Feature Matching Across Widely Separated Views. In CVPR, pages 774–781, 2000.

[15]: H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded Up Robust Features. In ECCV, 2006.

[16]: P. R. Beaudet. Rotationally Invariant Image Operators. In Proceedings of the 4th International Joint Conference on Pattern Recognition, pages 579–583, Kyoto, Japan, Nov. 1978.

[17]: Jia-Wang Bian, Yu-Huan Wu, Ji Zhao, Yun Liu, Le Zhang, Ming-Ming Cheng, and Ian Reid. An Evaluation of Feature Matchers for Fundamental Matrix Estimation. In BMVC, 2019.

[18]: Eric Brachmann and Carsten Rother. Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses. In ICCV, 2019.

[19]: G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.

[20]: M. Brown, G. Hua, and S. Winder. Discriminative Learning of Local Image Descriptors. PAMI, 2011.

[21]: M. Brown and D. Lowe. Automatic Panoramic Image Stitching Using Invariant Features. IJCV, 74:59–73, 2007.

[22]: Mai Bui, Christoph Baur, Nassir Navab, Slobodan Ilic, and Shadi Albarqouni. Adversarial Networks for Camera Pose Regression and Refinement. In ICCV Workshops, 2019.

[23]: Ondřej Chum and Jiří Matas. Matching with PROSAC Progressive Sample Consensus. In CVPR, pages 220–226, June 2005.

[24]: Ondřej Chum, Jiří Matas, and Josef Kittler. Locally Optimized RANSAC. In PR, 2003.

[25]: Ondřej Chum, Jiří Matas, and Josef Kittler. Locally Optimized RANSAC. In Pattern Recognition, 2003.

[26]: Ondřej Chum, Tomáš Werner, and Jiří Matas. Two-View Geometry Estimation Unaffected by a Dominant Plane. In CVPR, 2005.

[27]: Hainan Cui, Xiang Gao, Shuhan Shen, and Zhanyi Hu. HSfM: Hybrid Structure-from-Motion. In CVPR, 2017.

[28]: Zheng Dang, Kwang Moo Yi, Yinlin Hu, Fei Wang, Pascal Fua, and Mathieu Salzmann. Eigendecomposition-Free Training of Deep Networks with Zero Eigenvalue-Based Losses. In ECCV, 2018.

[29]: Daniel Barath, Jiří Matas, and Jana Noskova. MAGSAC: Marginalizing Sample Consensus. In CVPR, 2019.

[30]: D. DeTone, T. Malisiewicz, and A. Rabinovich. Toward Geometric Deep SLAM. arXiv preprint arXiv:1707.07410, 2017.

[31]: D. DeTone, T. Malisiewicz, and A. Rabinovich. SuperPoint: Self-Supervised Interest Point Detection and Description. CVPR Workshop on Deep Learning for Visual SLAM, 2018.

[32]: J. Dong and S. Soatto. Domain-Size Pooling in Local Descriptors: DSP-SIFT. In CVPR, 2015.

[33]: M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. In CVPR, 2019.

[34]: Patrick Ebel, Anastasiia Mishchuk, Kwang Moo Yi, Pascal Fua, and Eduard Trulls. Beyond Cartesian Representations for Local Descriptors. In ICCV, 2019.

[35]: Vassileios Balntas. SILDa: A Multi-Task Dataset for Evaluating Visual Localization. https://github.com/scape-research/silda, 2018.

[36]: M. A. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, 24(6):381–395, 1981.

[37]: P. Gay, V. Bansal, C. Rubino, and A. D. Bue. Probabilistic Structure from Motion with Objects (PSfMO). In ICCV, 2017.

[38]: Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In CVPR, 2012.

[39]: R. I. Hartley. In Defense of the Eight-Point Algorithm. PAMI, 19(6):580–593, June 1997.

[40]: R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.

[41]: R. I. Hartley. Projective Reconstruction and Invariants from Multiple Images. PAMI, 16(10):1036–1041, Oct 1994.

[42]: K. He, Y. Lu, and S. Sclaroff. Local Descriptors Optimized for Average Precision. In CVPR, 2018.

[43]: J. Heinly, J. L. Schönberger, E. Dunn, and J.-M. Frahm. Reconstructing the World in Six Days. In CVPR, 2015.

[44]: Karel Lenc, Varun Gulshan, and Andrea Vedaldi. VLBenchmarks. benchmarks/, 2011.

[45]: A. Kendall, M. Grimes, and R. Cipolla. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. In ICCV, pages 2938–2946, 2015.

[46]: J. Krishna Murthy, Ganesh Iyer, and Liam Paull. gradSLAM: Dense SLAM meets Automatic Differentiation. arXiv, 2019.

[47]: Zhengqi Li and Noah Snavely. MegaDepth: Learning Single-View Depth Prediction from Internet Photos. In CVPR, 2018.

[48]: David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 60(2):91–110, November 2004.

[49]: Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, and Long Quan. ContextDesc: Local Descriptor Augmentation with Cross-Modality Context. In CVPR, 2019.

[50]: Z. Luo, T. Shen, L. Zhou, S. Zhu, R. Zhang, Y. Yao, T. Fang, and L. Quan. GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints. In ECCV, 2018.

[51]: Simon Lynen, Bernhard Zeisl, Dror Aiger, Michael Bosse, Joel Hesch, Marc Pollefeys, Roland Siegwart, and Torsten Sattler. Large-scale, Real-time Visual-Inertial Localization Revisited. arXiv Preprint, 2019.

[52]: Will Maddern, Geoffrey Pascoe, Chris Linegar, and Paul Newman. 1 Year, 1000 km: The Oxford RobotCar Dataset. IJRR, 36(1):3–15, 2017.

[53]: J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust Wide-Baseline Stereo from Maximally Stable Extremal Regions. IVC, 22(10):761–767, 2004.

[54]: K. Mikolajczyk and C. Schmid. A Performance Evaluation of Local Descriptors. PAMI, 27(10):1615–1630, 2004.

[55]: K. Mikolajczyk, C. Schmid, and A. Zisserman. Human Detection Based on a Probabilistic Assembly of Robust Part Detectors. In ECCV, pages 69–82, 2004.

[56]: Milan Pultar, Dmytro Mishkin, and Jiří Matas. Leveraging Outdoor Webcams for Local Descriptor Learning. In Proceedings of CVWW 2019, 2019.

[57]: A. Mishchuk, D. Mishkin, F. Radenovic, and J. Matas. Working Hard to Know Your Neighbor's Margins: Local Descriptor Learning Loss. In NeurIPS, 2017.

[58]: Dmytro Mishkin, Jiří Matas, and Michal Perdoch. MODS: Fast and Robust Method for Two-View Matching. CVIU, 2015.

[59]: D. Mishkin, F. Radenovic, and J. Matas. Repeatability is Not Enough: Learning Affine Regions via Discriminability. In ECCV, 2018.

[60]: Arun Mukundan, Giorgos Tolias, and Ondřej Chum. Explicit Spatial Encoding for Deep Local Descriptors. In CVPR, 2019.

[61]: R. Mur-Artal, J. Montiel, and J. Tardós. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.

[62]: D. Nistér. An Efficient Solution to the Five-Point Relative Pose Problem. In CVPR, June 2003.

[63]: Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, and Bohyung Han. Large-Scale Image Retrieval with Attentive Deep Local Features. In ICCV, 2017.

[64]: Yuki Ono, Eduard Trulls, Pascal Fua, and Kwang Moo Yi. LF-Net: Learning Local Features from Images. In NeurIPS, 2018.

[65]: F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[66]: Stephen M. Pizer, E. Philip Amburn, John D. Austin, Robert Cromartie, Ari Geselowitz, Trey Greer, Bart ter Haar Romeny, John B. Zimmerman, and Karel Zuiderveld. Adaptive Histogram Equalization and Its Variations. Computer Vision, Graphics, and Image Processing, 1987.

[67]: M. Pultar, D. Mishkin, and J. Matas. Leveraging Outdoor Webcams for Local Descriptor Learning. In Computer Vision Winter Workshop, 2019.

[68]: C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In CVPR, 2017.

[69]: Filip Radenovic, Georgios Tolias, and Ondra Chum. CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples. In ECCV, 2016.

[70]: R. Ranftl and V. Koltun. Deep Fundamental Matrix Estimation. In ECCV, 2018.

[71]: J. Revaud, P. Weinzaepfel, C. De Souza, N. Pion, G. Csurka, Y. Cabon, and M. Humenberger. R2D2: Repeatable and Reliable Detector and Descriptor. arXiv Preprint, 2019.

[72]: Jérôme Revaud, Philippe Weinzaepfel, César Roberto de Souza, Noé Pion, Gabriela Csurka, Yohann Cabon, and Martin Humenberger. R2D2: Repeatable and Reliable Detector and Descriptor. In NeurIPS, 2019.

[73]: E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: An Efficient Alternative to SIFT or SURF. In ICCV, 2011.

[74]: Torsten Sattler, Bastian Leibe, and Leif Kobbelt. Improving Image-Based Localization by Active Correspondence Search. In ECCV, 2012.

[75]: T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Stenborg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic, F. Kahl, and T. Pajdla. Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions. In CVPR, 2018.

[76]: Torsten Sattler, Tobias Weyand, Bastian Leibe, and Leif Kobbelt. Image Retrieval for Image-Based Localization Revisited. In BMVC, 2012.

[77]: Torsten Sattler, Qunjie Zhou, Marc Pollefeys, and Laura Leal-Taixe. Understanding the Limitations of CNN-based Absolute Camera Pose Regression. In CVPR, 2019.

[78]: N. Savinov, A. Seki, L. Ladicky, T. Sattler, and M. Pollefeys. Quad-Networks: Unsupervised Learning to Rank for Interest Point Detection. In CVPR, 2017.

[79]: J. L. Schönberger and J.-M. Frahm. Structure-From-Motion Revisited. In CVPR, 2016.

[80]: J. L. Schönberger, H. Hardmeier, T. Sattler, and M. Pollefeys. Comparative Evaluation of Hand-Crafted and Learned Local Features. In CVPR, 2017.

[81]: Yunxiao Shi, Jing Zhu, Yi Fang, Kuochin Lien, and Junli Gu. Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment. arXiv Preprint, 2019.

[82]: E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer. Discriminative Learning of Deep Convolutional Feature Point Descriptors. In ICCV, 2015.

[83]: C. Strecha, W. V. Hansen, L. Van Gool, P. Fua, and U. Thoennessen. On Benchmarking Camera Calibration and Multi-View Stereo for High Resolution Imagery. In CVPR, 2008.

[84]: J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A Benchmark for the Evaluation of RGB-D SLAM Systems. In IROS, 2012.

[85]: Weiwei Sun, Wei Jiang, Eduard Trulls, Andrea Tagliasacchi, and Kwang Moo Yi. Attentive Context Normalization for Robust Permutation-Equivariant Learning. arXiv Preprint, 2019.

[86]: Chengzhou Tang and Ping Tan. BA-Net: Dense Bundle Adjustment Network. In ICLR, 2019.

[87]: Keisuke Tateno, Federico Tombari, Iro Laina, and Nassir Navab. CNN-SLAM: Real-time Dense Monocular SLAM with Learned Depth Prediction. In CVPR, 2017.

[88]: B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni, D. Poland, D. Borth, and L. Li. YFCC100M: The New Data in Multimedia Research. In CACM, 2016.

[89]: Y. Tian, B. Fan, and F. Wu. L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space. In CVPR, 2017.

[90]: Yurun Tian, Xin Yu, Bin Fan, Fuchao Wu, Huub Heijnen, and Vassileios Balntas. SOSNet: Second Order Similarity Regularization for Local Descriptor Learning. In CVPR, 2019.

[91]: Giorgos Tolias, Yannis Avrithis, and Hervé Jégou. Image Search with Selective Match Kernels: Aggregation Across Single and Multiple Images. IJCV, 116(3):247–261, Feb 2016.

[92]: P. H. S. Torr and A. Zisserman. MLESAC: A New Robust Estimator with Application to Estimating Image Geometry. CVIU, 78:138–156, 2000.

[93]: B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle Adjustment – A Modern Synthesis. In Vision Algorithms: Theory and Practice, pages 298–372, 2000.

[94]: Andrea Vedaldi and Brian Fulkerson. VLFeat: An Open and Portable Library of Computer Vision Algorithms. In Proceedings of the 18th ACM International Conference on Multimedia, MM '10, pages 1469–1472, 2010.

[95]: Y. Verdie, K. M. Yi, P. Fua, and V. Lepetit. TILDE: A Temporally Invariant Learned DEtector. In CVPR, 2015.

[96]: S. Vijayanarasimhan, S. Ricco, C. Schmid, R. Sukthankar, and K. Fragkiadaki. SfM-Net: Learning of Structure and Motion from Video. arXiv Preprint, 2017.

[97]: X. Wei, Y. Zhang, Y. Gong, and N. Zheng. Kernelized Subspace Pooling for Deep Local Descriptors. In CVPR, 2018.

[98]: Changchang Wu. Towards Linear-Time Incremental Structure from Motion. In 3DV, 2013.

[99]: Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. LIFT: Learned Invariant Feature Transform. In ECCV, 2016.

[100]: K. M. Yi, E. Trulls, Y. Ono, V. Lepetit, M. Salzmann, and P. Fua. Learning to Find Good Correspondences. In CVPR, 2018.

[101]: S. Zagoruyko and N. Komodakis. Learning to Compare Image Patches via Convolutional Neural Networks. In CVPR, 2015.

[102]: Jiahui Zhang, Dawei Sun, Zixin Luo, Anbang Yao, Lei Zhou, Tianwei Shen, Yurong Chen, Long Quan, and Hongen Liao. Learning Two-View Correspondences and Geometry Using Order-Aware Network. In ICCV, 2019.

[103]: Xu Zhang, Felix X. Yu, Svebor Karaman, and Shih-Fu Chang. Learning Discriminative and Transformation Covariant Local Feature Detectors. In CVPR, 2017.

[104]: Chen Zhao, Zhiguo Cao, Chi Li, Xin Li, and Jiaqi Yang. NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences. In CVPR, 2019.

[105]: Qunjie Zhou, Torsten Sattler, Marc Pollefeys, and Laura Leal-Taixe. To Learn or Not to Learn: Visual Localization from Essential Matrices. arXiv Preprint, 2019.

[106]: Siyu Zhu, Runze Zhang, Lei Zhou, Tianwei Shen, Tian Fang, Ping Tan, and Long Quan. Very Large-Scale Global SfM by Distributed Motion Averaging. In CVPR, 2018.

[107]: C. L. Zitnick and K. Ramnath. Edge Foci Interest Points. In ICCV, 2011.

[108]: A. Alahi, R. Ortiz, and P. Vandergheynst. FREAK: Fast Retina Keypoint. In CVPR, 2012.

[109]: S. Leutenegger, M. Chli, and R. Y. Siegwart. BRISK: Binary Robust Invariant Scalable Keypoints. In ICCV, pages 2548–2555, 2011.




